We propose techniques for processing SPARQL queries over a large RDF graph in a distributed environment. We adopt a "partial evaluation and assembly" framework. Answering a SPARQL query Q is equivalent to finding subgraph matches of the query graph Q over the RDF graph G. Based on properties of subgraph matching over a distributed graph, we introduce local partial matches as the partial answers in each fragment of RDF graph G. For assembly, we propose two methods: centralized and distributed assembly. We analyze our algorithms both theoretically and experimentally. Extensive experiments over both real and benchmark RDF repositories of billions of triples confirm that our method is superior to the state-of-the-art methods in both system performance and scalability.